How to use regular expressions (regex) for pattern matching
In Perl-style regular expressions, a pattern match is written /pattern/ or m/pattern/.

=Characters=

==Meta characters==
* * matches 0 or more of the previous expression.
* + matches 1 or more of the previous expression.
* ? matches 0 or 1 of the previous expression; placed after another quantifier, it forces minimal matching when an expression might match several strings within a search string.
* . matches any character (except the newline \n).
* ( ) logical grouping of part of an expression.
* [ ] explicit set of characters to match.
* { } explicit quantifier notation.
* \ placed before one of the above, makes it a literal instead of a special character. Placed before certain ordinary characters, it forms a special matching sequence (see below).
* / delimits the pattern in the /pattern/ notation.
* | alternation: matches either the expression before it or the expression after it.
* ^ matches the beginning of a string.
* $ matches the end of a string.

==Literal characters==
Any character that is not a meta character matches itself.

==Character classes==
* . matches any character except the newline.
* [aeiou] matches any character in the specified set.
* [^aeiou] matches any character not in the specified set.
* [0-9a-eA-E] matches any character in the range from the character before the hyphen to the character after it. This example matches any character from 0 through 9, lowercase a through e, or uppercase A through E (inclusive). Equivalent to [0123456789abcdeABCDE].
* \p{name} matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges, for example Ll, Nd, Z, IsGreek, IsBoxDrawing.
* \P{name} matches any character not included in the groups and block ranges specified by {name}.
* \w matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
* \W matches any nonword character. Equivalent to the Unicode character categories [^\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}].
* \s matches any white-space character. Equivalent to [\f\n\r\t\v\x85\p{Z}].
* \S matches any non-white-space character. Equivalent to [^\f\n\r\t\v\x85\p{Z}].
* \d matches any decimal digit. Equivalent to \p{Nd} with Unicode matching and to [0-9] with non-Unicode (ECMAScript) behavior.
* \D matches any nondigit. Equivalent to \P{Nd} with Unicode matching and to [^0-9] with non-Unicode behavior.
* \b matches a word boundary, the spot between a word (\w) character and a non-word (\W) character.
** example /\bfred\b/i matches Fred but not Alfred or Frederick

==POSIX character classes==
POSIX class names are written inside a bracket expression, so the class [:alnum:] is used as [[:alnum:]].
* [:alnum:] matches alphanumeric characters.
** example [[:alnum:]]{3} matches any three letters or numbers, like 7Ds
* [:alpha:] matches alphabetic characters, any case.
** example [[:alpha:]]{5} matches five alphabetic characters, any case, like aBcDe
* [:blank:] matches space and tab.
** example [[:blank:]]{3,5} matches any three, four, or five spaces and tabs
* [:digit:] matches digits.
** example [[:digit:]]{3,5} matches any three, four, or five digits, like 489, 1234, or 54321
* [:lower:] matches lowercase alphabetics.
** example [[:lower:]] matches a but not A
* [:punct:] matches punctuation characters.
** example [[:punct:]] matches ! or . or , but not a or 3
* [:space:] matches all whitespace characters, including newline and carriage return.
** example [[:space:]] matches any space, tab, newline, or carriage return
* [:upper:] matches uppercase alphabetics.
** example [[:upper:]] matches A but not a

==Escape sequences==
* \t matches tab (HT, TAB)
* \n matches newline (LF, NL)
* \r matches return (CR)
* \f matches form feed (FF)
* \a matches alarm (bell) (BEL)
* \e matches escape (think troff) (ESC)
* \033 matches the character with that octal code (think of a PDP-11)
* \x1B matches the character with that hex code
* \x{263a} matches the character with that wide hex code (Unicode SMILEY)
* \c[ matches a control character
* \N{name} matches a named character
* \l lowercases the next character (think vi)
* \u uppercases the next character (think vi)
* \L lowercases until \E (think vi)
* \U uppercases until \E (think vi)
* \E ends case modification (think vi)
* \Q quotes (disables) pattern meta characters until \E

==Repetition operators==
* * matches 0 or more times
* + matches 1 or more times
* ? matches 1 or 0 times
* {n} matches exactly n times
* {n,} matches at least n times
* {n,m} matches at least n but not more than m times

==Anchoring operators==
* ^ the match must start at the beginning of the line.
** example ^foo
* $ the match must end at the end of the line.

==Word operators==
* \b matches the empty string at either the beginning or the end of a word. For example, `\brat\b' matches the separate word `rat'.
* \B matches the empty string within a word. For example, `c\Brat\Be' matches `crate', but `dirty \Brat' doesn't match `dirty rat'.
* \< matches the empty string at the beginning of a word.
* \> matches the empty string at the end of a word.
* \w matches any word-constituent character.
* \W matches any character that is not word-constituent.

==Buffer operators==
The following operators work on buffers. In Emacs, a buffer is, naturally, an Emacs buffer. For other programs, Regex considers the entire string to be matched as the buffer.
* \` matches the empty string at the beginning of the buffer.
* \' matches the empty string at the end of the buffer.

==Non-greedy wildcards and repetitions==
Adding ? after a quantifier makes it non-greedy: it matches as few times as possible instead of as many as possible.
* *? matches 0 or more times
* +? matches 1 or more times
* ?? matches 0 or 1 time
* {n}? matches exactly n times
* {n,}? matches at least n times
* {n,m}? matches at least n but not more than m times

=Groups, List=
* ( ) group operator
** example (cat|hat) matches cat or hat
* [ ] class operator
** example [jfet] matches j or f or e or t

From HowTo Wiki, a Wikia wiki.
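The following is a minimal sketch of several operators listed in this article, written with Python's standard re module. Python is an assumption here: the article uses the Perl-style /pattern/ notation, which in Python becomes a raw string passed to re.compile or re.search, with the /i modifier written as re.IGNORECASE. Python's re does not accept the POSIX classes, \p{name}, \<, \>, or the buffer operators in the forms listed in this article, so the sketch only uses operators it does support, and re.escape stands in for \Q...\E. Sample strings that do not appear in the article are made up for illustration.

```python
import re

# /\bfred\b/i: word boundaries plus case-insensitive matching.
fred = re.compile(r"\bfred\b", re.IGNORECASE)
fred_hits = [s for s in ("Fred", "Alfred", "Frederick") if fred.search(s)]

# Character class with an explicit {n} quantifier, anchored with ^ and $.
three_chars = re.compile(r"^[0-9a-eA-E]{3}$")
class_hits = [s for s in ("7Ad", "7Ag", "7Add") if three_chars.search(s)]

# Greedy + takes as much as possible; non-greedy +? takes as little as possible.
tagged = "<b>bold</b>"
greedy = re.search(r"<.+>", tagged).group()
lazy = re.search(r"<.+?>", tagged).group()

# Grouping with alternation: (cat|hat) matches cat or hat.
words = re.findall(r"\b(cat|hat)\b", "the cat sat on a hat")

# \B only matches inside a word, never at a word boundary.
inside_word = re.search(r"c\Brat\Be", "crate") is not None
after_space = re.search(r"dirty \Brat", "dirty rat") is not None

# re.escape plays the role of \Q...\E: every meta character becomes a literal.
literal = re.compile(re.escape("a.b*c"))
literal_hits = [s for s in ("a.b*c", "aXbbc") if literal.search(s)]

print(fred_hits)                 # ['Fred']
print(class_hits)                # ['7Ad']
print(greedy, lazy)              # <b>bold</b> <b>
print(words)                     # ['cat', 'hat']
print(inside_word, after_space)  # True False
print(literal_hits)              # ['a.b*c']
```

Without re.escape, the pattern a.b*c would also match aXbbc, because . and * would keep their special meaning.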